keywords: Breast cancer, Logistic Regression, Machine learning, Predictive modeling
Breast cancer remains one of the leading causes of death among women worldwide, and early diagnosis is crucial for effective treatment and survival. Traditional diagnostic methods such as mammography and biopsy, though effective, are often limited by human error and time constraints. Recent advances in machine learning (ML) have enabled the development of automated models for accurate and efficient cancer prediction. This study applies Logistic Regression (LR) to predict breast cancer from clinical and histopathological datasets obtained from Kaggle and the University of Ilorin Teaching Hospital. The datasets were preprocessed through normalization, correlation analysis, and recursive feature elimination (RFE) to ensure data consistency and to select an informative feature subset. The data were divided into training (70%) and testing (30%) subsets. The model's hyperparameters were tuned with GridSearchCV, and performance was assessed using accuracy, precision, recall, F1-score, and the area under the Receiver Operating Characteristic (ROC) curve (AUC). The Logistic Regression model achieved an accuracy of 98.2%, a precision of 96.9%, a recall of 98.4%, and an F1-score of 97.6%. ROC analysis confirmed a high discriminative capability, with an AUC of 0.99, and the model outperformed Support Vector Machine (SVM) and Decision Tree (DT) models under the same experimental conditions. The results validate Logistic Regression as a robust, interpretable, and computationally efficient model for breast cancer prediction. Its simplicity, transparency, and diagnostic accuracy make it suitable for deployment in clinical decision-support systems, particularly in low-resource settings.
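The preprocessing, splitting, and tuning steps summarized above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the study's code, and its scores are not expected to match those reported above. The Kaggle and University of Ilorin data are replaced by scikit-learn's bundled Wisconsin Diagnostic Breast Cancer dataset. "Normalization" is assumed to mean min-max scaling. The correlation threshold (0.9), the number of RFE features (10), the `C` grid, the 5-fold cross-validation, and the helper names `uncorrelated_columns` and `run` are all hypothetical choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def uncorrelated_columns(X, threshold=0.9):
    """Hypothetical helper: greedily keep a column only if its absolute
    Pearson correlation with every already-kept column is <= threshold.
    The 0.9 threshold is an assumption, not a value from the study."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep


def run(seed=42):
    # Stand-in data (assumption): scikit-learn's bundled WDBC dataset.
    X, y = load_breast_cancer(return_X_y=True)
    # In this dataset 0 = malignant, 1 = benign; flip so malignant is the
    # positive class for precision, recall, and F1.
    y = 1 - y

    # 70% training / 30% testing split, stratified by class.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)

    # Correlation analysis: computed on the training split only.
    keep = uncorrelated_columns(X_tr)
    X_tr, X_te = X_tr[:, keep], X_te[:, keep]

    # Scaling and RFE live inside the pipeline, so GridSearchCV refits them
    # on each cross-validation training fold.
    pipe = Pipeline([
        ("scale", MinMaxScaler()),
        ("rfe", RFE(LogisticRegression(max_iter=5000),
                    n_features_to_select=10)),  # assumed feature count
        ("clf", LogisticRegression(max_iter=5000)),
    ])
    # Assumed grid over the inverse regularization strength C.
    grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10, 100]},
                        scoring="roc_auc", cv=5)
    grid.fit(X_tr, y_tr)

    # Evaluate the refit best model on the held-out 30%.
    pred = grid.predict(X_te)
    prob = grid.predict_proba(X_te)[:, 1]
    return {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred),
        "recall": recall_score(y_te, pred),
        "f1": f1_score(y_te, pred),
        "auc": roc_auc_score(y_te, prob),
        "best_C": grid.best_params_["clf__C"],
    }


if __name__ == "__main__":
    for name, value in run().items():
        print(f"{name}: {value}")
```

Fitting the correlation filter on the training split, and the scaler and RFE inside the `Pipeline`, keeps test-set information out of feature selection, so the held-out metrics are not optimistically biased by preprocessing.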